# A tibble: 32,561 × 15
age workclass fnlwgt education education.num marital.status occupation
<dbl> <chr> <dbl> <chr> <dbl> <chr> <chr>
1 90 ? 77053 HS-grad 9 Widowed ?
2 82 Private 132870 HS-grad 9 Widowed Exec-mana…
3 66 ? 186061 Some-college 10 Widowed ?
4 54 Private 140359 7th-8th 4 Divorced Machine-o…
5 41 Private 264663 Some-college 10 Separated Prof-spec…
6 34 Private 216864 HS-grad 9 Divorced Other-ser…
7 38 Private 150601 10th 6 Separated Adm-cleri…
8 74 State-gov 88638 Doctorate 16 Never-married Prof-spec…
9 68 Federal-gov 422013 HS-grad 9 Divorced Prof-spec…
10 41 Private 70037 Some-college 10 Never-married Craft-rep…
# ℹ 32,551 more rows
# ℹ 8 more variables: relationship <chr>, race <chr>, sex <chr>,
# capital.gain <dbl>, capital.loss <dbl>, hours.per.week <dbl>,
# native.country <chr>, income <chr>
---
title: "Adult Census"
output:
  flexdashboard::flex_dashboard:
    orientation: columns
    vertical_layout: fill
    logo: download.png
    source_code: embed
    social: menu
---
```{r setup, include=FALSE}
library(flexdashboard)
library(tidyverse)
library(plotly)
library(knitr)
library(DT)
df <- read_csv('~/Statistical Analysis with R/adult_census.csv')
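# Build each ggplot object once here; the layout chunks below only print them.

# Plot 1: proportion of each sex within each education level
plot1 <- df %>%
  ggplot() +
  geom_bar(mapping = aes(y = education, fill = sex),
           position = 'fill') +
  labs(x = "Proportion", y = "Education Status", fill = "Sex")

# Plot 2: proportion of each race within each native country
plot2 <- df %>%
  ggplot() +
  geom_bar(mapping = aes(y = native.country, fill = race),
           position = 'fill') +
  labs(x = "Proportion", y = "Native Country", fill = "Race")

# Plot 3: proportion of each marital status within each sex
plot3 <- df %>%
  ggplot() +
  geom_bar(mapping = aes(x = sex, fill = marital.status),
           position = 'fill') +
  labs(x = "Sex", y = "Proportion", fill = "Marital Status")

# Plot 4: count of each work class at each age (dodged bars, so counts, not proportions)
plot4 <- df %>%
  ggplot() +
  geom_bar(mapping = aes(x = age, fill = workclass),
           position = 'dodge') +
  labs(x = "Age", y = "Count", fill = "Work Class")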
```
{.sidebar}
=======================================================================
### 1. Census Definition
census (noun, cen·sus): a usually complete count of a population (as of a state); especially: a periodic governmental count of a population that usually includes social and economic information (as occupations, ages, and incomes).
(Source: https://www.merriam-webster.com/dictionary/census)
### 2. Census Dataset
This census data set consists of 15 variables, a mix of numerical and categorical. The target variable is "income": the data set is used to predict whether a person's income is greater than $50k or not, using the other variables and demographics from the adult census data.
### 3. More Information
More information about this dataset can be found at https://archive.ics.uci.edu/dataset/2/adult
Data Tables and Plot 1
=======================================================================
Column {data-width=500 .tabset}
-----------------------------------------------------------------------
### Column Tab 1
```{r}
df
```
### Column Tab 2
```{r}
datatable(df, options = list(
pageLength = 25
))
```
Column {data-width=500}
-----------------------------------------------------------------------
### Row 1
```{r}
plot1
```
### Row 2
```{r}
ggplotly(plot1)
```
Plots 2 and 3
=======================================================================
Column {data-width=500}
-----------------------------------------------------------------------
#### 1. Plot 2
Plot 2 shows the proportion of each race within each native country. This is helpful because it shows which parts of the world respondents came from and the racial makeup of the respondents from each of those places.
#### 2. Plot 3
Plot 3 shows the proportion of each marital status within each sex. One striking result is the large proportion of women in the data set who were never married. This could be investigated further to see whether it is explained by the ages of the participants or by some other underlying cause.
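One way to start that investigation is to compare the never-married share across age bands for each sex. The sketch below is only an illustration: the `toy` data frame is made up, and the helper name `never_married_share` and the age bands are my own assumptions, not part of the dashboard. It uses base R so it can be run on its own.

```r
# Toy stand-in for the census data; the values are made up for illustration.
toy <- data.frame(
  age = c(19, 22, 24, 31, 38, 45, 52, 60, 23, 35),
  sex = c("Female", "Female", "Male", "Female", "Male",
          "Female", "Male", "Female", "Male", "Male"),
  marital.status = c("Never-married", "Never-married", "Never-married",
                     "Divorced", "Separated", "Widowed",
                     "Separated", "Divorced", "Never-married",
                     "Never-married"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: share of never-married respondents
# within each combination of sex and age band.
never_married_share <- function(data,
                                breaks = c(0, 25, 40, 65, Inf),
                                labels = c("<25", "25-39", "40-64", "65+")) {
  # right = FALSE makes each band include its lower bound, e.g. 25 falls in "25-39".
  age_band <- cut(data$age, breaks = breaks, labels = labels, right = FALSE)
  never <- data$marital.status == "Never-married"
  # Averaging a logical flag gives a proportion; empty cells come back as NA.
  tapply(never, list(sex = data$sex, age_band = age_band), mean)
}

shares <- never_married_share(toy)
print(round(shares, 2))
```

On the real data this would be called as `never_married_share(df)`. If the difference between the sexes shrinks once age band is held fixed, that would suggest age accounts for part of the result.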
Column {data-width=500}
-----------------------------------------------------------------------
### Row 1
```{r}
ggplotly(plot2)
```
### Row 2
```{r}
ggplotly(plot3)
```
Plot 4
=======================================================================
Column {data-width=500}
-----------------------------------------------------------------------
#### 1. Plot 4
Plot 4 shows the count of each work class at each age. Because the bars are dodged rather than filled, their heights are counts, not proportions. The majority at every age are in the Private work class, especially at younger ages. Note that Private denotes private-sector employment; self-employment is recorded separately in the Self-emp-inc and Self-emp-not-inc classes. Another plot that could go along with this one would compare education level against work class, to see whether a high school education is associated with a larger share of the Private work class, and so on.
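The education and work class comparison suggested above can also be checked numerically before plotting. A minimal sketch, assuming made-up `toy` data and a hypothetical helper `workclass_by_education`: it computes, for each education level, the share of each work class, which are the values a bar chart with `position = 'fill'` would display.

```r
# Toy stand-in for the census data; the values are made up for illustration.
toy <- data.frame(
  education = c("HS-grad", "HS-grad", "HS-grad", "HS-grad",
                "Doctorate", "Doctorate", "Some-college", "Some-college"),
  workclass = c("Private", "Private", "Private", "?",
                "State-gov", "Private", "Private", "Federal-gov"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: row proportions of work class within each education level.
workclass_by_education <- function(data) {
  counts <- table(education = data$education, workclass = data$workclass)
  # margin = 1 divides each row by its own total, so every row sums to 1.
  prop.table(counts, margin = 1)
}

props <- workclass_by_education(toy)
print(round(props, 2))
```

On the real data this would be called as `workclass_by_education(df)`, and the Private column could then be compared across education levels.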
#### 2. Missing Data
In the plots, including plot 4, I did not remove the missing values marked by "?". In plot 4 the missing data adds insight into the data set: a missing work class makes sense because the majority of the missing values belong to people under 25 years of age, who might not be working yet or might have other reasons for not reporting a work class to the adult census.
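The statement about age can be checked directly. A minimal sketch, assuming made-up `toy` data and a hypothetical helper `missing_workclass_by_age`, with the cutoff of 25 taken from the text above: it reports what share of the "?" rows fall under the cutoff, and how the "?" rate compares below and above it.

```r
# Toy stand-in for the census data; the values are made up for illustration.
toy <- data.frame(
  age = c(18, 20, 23, 24, 30, 41, 54, 66, 90),
  workclass = c("?", "?", "Private", "?", "Private",
                "Private", "Private", "?", "?"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: summarise how "?" work class values relate to an age cutoff.
missing_workclass_by_age <- function(data, cutoff = 25) {
  is_missing <- data$workclass == "?"
  under_cutoff <- data$age < cutoff
  c(
    # Of all "?" rows, the fraction belonging to people under the cutoff.
    share_of_missing_under_cutoff = mean(under_cutoff[is_missing]),
    # Fraction of "?" rows among people under the cutoff.
    missing_rate_under_cutoff = mean(is_missing[under_cutoff]),
    # Fraction of "?" rows among people at or over the cutoff.
    missing_rate_at_or_over_cutoff = mean(is_missing[!under_cutoff])
  )
}

result <- missing_workclass_by_age(toy)
print(round(result, 2))
```

On the real data this would be called as `missing_workclass_by_age(df)`. A first value above 0.5 would support the statement that most of the missing work class entries come from people under 25.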
Column {data-width=500}
-----------------------------------------------------------------------
### Row 1
```{r}
plot4
```
### Row 2
```{r}
ggplotly(plot4)
```